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Abstract 1 

Many approaches to Bayesian image segmentation have used maximum a posteriori (MAP) 
estimation in conjunction with Markov random fields (MRF). While this approach performs well, 
it has a number of disadvantages. In particular, exact MAP estimates cannot be computed, ap- 
proximate MAP estimates are computationally expensive to compute, and unsupervised parameter 
estimation of the MRF is difficult. 

In this paper, we propose a new approach to Bayesian image segmentation which directly 
addresses these problems. The new method replaces the MRF model with a novel multiscale 
random field (MSRF), and replaces the MAP estimator with a sequential MAP (SMAP) estimator 
derived from a novel estimation criteria. Together, the proposed estimator and model result in a 
segmentation algorithm which is not iterative and can be computed in time proportional to MN 
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where M is the number of classes and N is the number of pixels. We also develop a computationally 
efficient method for unsupervised estimation of model parameters. 

Simulations on synthetic images indicate that the new algorithm performs better and requires 
much less computation than MAP estimation using simulated annealing. The algorithm is also 
found to improve classification accuracy when applied to the segmentation of multispectral remotely 
sensed images with ground truth data. 

IP EDICS #1.6 Image Processing: Multiresolution Processing 
or 

IP EDICS #1.5 Image Processing: Segmentation 

f This work was supported by the US Army Construction Engineering Research Laboratory grant numbei 
DACA8890D0029, and an NEC faculty Fellowship. 

IEEE Trans, on Image Processing, vol. 3, no. 2, pp. 162-177, March 1994. 

1 This manuscript appeared as: C. A. Bouman and M. Shapiro, "A Multiscale Random Field Model for 
Bayesian Image Segmentation," IEEE Trans, on Image Processing, vol. 3, no. 2, pp. 162-177, March 1994. 
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1 Introduction 

Haralick and Shapiro have suggested that a good segmentation of a image should separate the 
image into simple regions with homogeneous behavior [1]. In recent years, many authors have used 
Bayesian estimation techniques as a framework for computing segmentations which best compromis 
between these two opposing objectives [2, 3, 4]. These methods model the shape of segmented 
regions in addition to the behavior of pixels in each homogeneous region. The segmentation is then 
computed by estimating the best label for each pixel. 

A number of estimation techniques and region models have been used for the Bayesian seg- 
mentation problem. Typically, the labels of image pixels are modeled as a Markov random field 
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^mkt ; or equivaienuy uidds aismounons pj. inese moaeis are usea oecause iney oniy require 
the specification of spatially local interactions using a set of local parameters. This is important 
since spatially local interactions result in segmentation algorithms which only require local com- 
putations. Most often, the image is then segmented by approximately computing the maximum a 
posteriori (MAP) estimate of the pixel labels. 

These statistical approaches to segmentation provide an important framework, and have im- 
proved results in the application of segmentation to natural scenes [6], tomographic cross sections 
[7], texture images [2], and multispectral remotely sensed images[8, 9, 10, 1 1]. However, the ap- 
proach has a number of important disadvantages. 

Computing the MAP estimate requires the minimization of a discrete functional with many loca 
minima. Exact minimization is intractable, so methods for approximately minimizing the true MAP 
estimate must be used. These methods include simulated annealing [12], greedy minimization [3], 
dynamic programming [4], and multiresolution minimization [13, 14, 15, 16], However, all of these 
approaches require approximations in the two dimensional case, and are either iterative or very 
computationally expensive. 

The MRF model has a limited ability to describe large scale behaviors. For example, we may 
know that segmented regions are likely to be at least 50 pixels wide. However, it is difficult 
to accurately incorporate this information by specifying the interactions of adjacent pixels. The 
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model can be improved by using a larger neighborhood for each pixel, but this rapidly increases 
the number of parameters of interaction, and the complexity of the segmentation algorithms. The 
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fundamental limitation of local models is that they do not allow behavior to be directly controlled 
at different spatial scales. This is of critical importance since scale variation occurs naturally in 
images, and is important in quantifying image behavior [17, 18]. 

The MAP estimate does not have desirable properties for the segmentation problem [19, 20]. 
The MAP estimate minimizes the probability that any pixel in the image will be misclassified. 
This is an excessively conservative criteria since any segmentation algorithm is likely to result in 
some misclassified pixels. In practice, it has been noted that MAP estimation has some undesirable 
global properties which may actually make an approximate minimization more desirable [3, 20]. 
For example, in multiresolution segmentation, MRF correlations parameters were found to increase 
at coarser scales [14, 15]. This is counter to the physical intuition that coarser sampling should 
produce less correlation. 

The maximizer of the posterior marginals (MPM) estimator has been suggested [19] as an al- 
ternative to MAP estimation, since it minimizes the probability of classification error. However, 
it may only be approximately computed in a computationally expensive procedure similar to sim- 
ulated annealing. Also, the MPM criteria does not consider the spatial placement of errors when 
distinguishing among the quality of segmentations. 

Finally, parameter estimation of MRF's is difficult. When parameters are above the "critical 
temperature" there may be no consistent estimator as the image size grows to infinity[21]. Methods 
have been developed to estimate MRF parameters from images being segmented [22], but they are 
computationally expensive. 

In this paper, we attempt to address these difficulties by introducing a new approach to Bayesiai 
image segmentation. This method replaces the MRF model with a novel multiscale random field 
(MSRF), and replaces the MAP estimator with a sequential MAP (SMAP) estimator derived from 
a new estimation criteria. Together, the proposed estimator and model result in a segmentation 
algorithm which is not iterative and can be computed in time proportional to MN where M is 
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the number of classes and N is the number of pixels. We also develop a method for estimating 
the parameters of the MSRF model directly from the image during the segmentation process. This 
allows images with differing region sizes to be segmented accurately without specific prior knowledj 
of their behavior. 

The MSRF model we propose is composed of a series of random fields progressing from coarse 
to fine scale. Each field is assumed to only depend on the previous coarser field. Therefore, the 
series of fields form a Markov chain in scale or resolution. Further, we assume that points in each 
field are conditionally independent given their coarser scale neighbors. This leads to a rich model 
with computationally tractable properties. In fact, Luettgen, Karl, Willsky and Tenney have shown 
in independent work that models similar to the MSRF actually include MRF's as a subclass [23]. 

In earlier work, Chou, Willsky, Benveniste, Basseville, Golden, and Nikoukhah [24, 25, 26, 27, 
have shown that Markov Chains in scale can be used to model continuously valued Gaussian 
processes in one and two dimensions. This work has resulted in fast algorithms for problems such 
as optical flow estimation [29]. Estimation for these models is performed using a generalizations of 
Kalman filtering [26, 27, 28]. This approach is ideal for Gaussian models since the MAP, conditiona 
mean, and minimum mean squared estimates coincide, and may be computed using only recursively 
computed first and second order statistics. However, since our model requires discrete values to 
represent pixel classes, these methods are not applicable. 

The MSRF model has a number of advantages over fixed scale MRF's. The Markov chain 
structure facilitates straight forward methods for parameter estimation since it eliminates difficulties 
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with intractable normalizing constants (partition functions) found in MRF's. Yet the model does 
not impose an unnatural spatial ordering to the pixels since the Markov chain is in scale. Also, 
since explicit parameters are available to control both coarse and fine scale behavior, the MSRF 
model can more accurately describe image behavior. 

The SMAP estimation method results from minimizing the expected size of the largest misclas- 
sified region. This is accomplished by assigning progressively larger cost to errors at coarser scale. 
Intuitively, the criteria accounts for the fact that an error at coarse scale is more grievous since it 
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causes the misclassification of many pixels. The SMAP criteria results in a series of optimization 
steps going from coarse to fine scale. At each scale, the best segmentation is computed given the 
previous coarser segmentation and the observed data. Each maximization step is computationally 
simple and noniterative if the region parameters are known. The complete procedure is reminiscent 
of pyramidal pixel linking procedures [30, 31], but requires local computations much like those used 
in Bayesian networks [32]. 

If the region parameters are unknown, they may be estimated using an iterative procedure at 
each scale. This iterative procedure, based on the expectation maximization (EM) algorithm [33], 
is implemented by subsampling the image. Therefore, parameter estimation only increases the 
required computation by approximately a factor of two. 

Finally, we note the multispectral SMAP segmentation algorithm is available in the Geograph- 
ical Resources Analysis Support System (GRASS) Version 4.1 [34]. GRASS is a public domain 
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geographic information system (GIS). 

Section 2 describes the general structure of our segmentation approach, while Section 3 develop 
the detailed segmentation formulas. Finally, Section 4 applies the algorithm to both synthetic 
images and remotely sensed multispectral images with corresponding ground truth data. 

2 Multiscale Segmentation Approach 

The random field Y is the image which must be segmented into regions of distinct statistical 
behavior. (We use upper case letters to denote random quantities, while lower case letters denote 
the corresponding deterministic realizations.) Individual pixels in Y are denoted by Y 
a member of a two dimensional lattice of points S. 

The basis of our segmentation approach is a hierarchical or doubly stochastic model as shown in 
Fig. 1. This model assumes that the behavior of each observed pixel is dependent on a corresponding 
unobserved label pixel in X. Each label specifies one of M possible classes, each with it own 
statistical behavior. The dependence of observed pixels on their labels is specified through p 
the conditional distribution of Y given X. Prior knowledge about the size and shapes of regions 
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Y - Observed image c< 
four distinct textures 
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X - Unobserved tie Id c 
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the class of each pixel 



Figure 1 : Structure of a doubly stochastic random field used in segmen- 
tation. The behavior of the image (e.g. texture, gray scale, color or 
multi-spectral values) given the class labels is defined by the conditional 
distribution p y | X (y|x). Prior information is contained in the distribution 
of the class labels p(x). 



will be modeled by the prior distribution p(x). 

Since a variety of features can be used with this approach, it is a general framework for the 
segmentation problem. For the texture segmentation problem, a stochastic texture model can be 
used for p y(x (y|x) [4, 35, 15], or texture feature vectors can be extracted at each pixel [36, 37, 38] 
and modeled with a multivariate distribution. However, we will use segmentation of multispectral 
remotely sensed images as the target application for our examples. In this case, each pixel, Y 
be a vector of D spectral components. 

In the following sections, we first describe the general structure of a MSRF model for p(x), 
and we develop a sequential MAP estimation approach for computing the best segmentation. The 
detailed models and recursion formulas resulting from this framework are then derived in Section 3. 

2. 1 Multiscale Random Field Model 

In this section, we develop a multiscale Random Field (MSRF) model which is composed of a series 
of random fields at varying scales or resolutions. Fig. 2 depicts the pyramid structure of the MSRF. 
At each scale, n, the segmentation or labeling is denoted by the random field X 
lattice points is denoted by S (n) . In particular, X (0) is assumed to be the finest scale r 

with each point corresponding to a single image pixel. Each label at the next coarser scale, X 
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n=2 

n=l 

Figure 2: Pyramid structure of the MSRF. The random field at each scale 
is causally dependent on the coarser scale field above it. 

then corresponds to a group of 4 points in the original image. Therefore, the number of points in 
S (1) is 1/4 the number of points in S (0) . 

The fundamental assumption of the MSRF model is that the sequence of random fields from 
coarse to fine scale form a Markov chain. Therefore, the distribution of X (n) gh 

fields is only dependent on X (n+1) . This is a reasonable assumption since X < n 

all the relevant information from previous coarser scales. Formally, this Markov chain relation may 
be state as 

P(X (n) =x (n) |X0> = x (») l> n ) = P(X (n) =x (n) l* (n+1) =x (: 

= p x « |x (D+l ) (x (n) l* (n+1) ). 

Correspondingly, the exclusive dependence of Y on X (0) implies that 

P(Y^dy|X (n) n>0) = P(Y^dy|X (0) ) 

= P y|x<*> (y|x (0) ). 

The joint distribution of X and Y may then be expressed as the product of these distributions 

L-l 

//vv /_\ i /_ • i \ 
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P(Y edy,X = x) = p y|x « <y|x w ) p x « |x cu (x w I* ) p xC 

n=0 

where L is the coarsest scale in X. This Markov structure in scale has the isotropic behavior 
associated with MRF's; but in addition, the causal dependence in scale results in a noniterative 
segmentation algorithm and direct methods of parameter estimation. 

7 
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2.2 Sequential MAP estimation 

In order to segment the image, Y , we must accurately estimate the pixel labels in X. Bayesian 
estimation techniques are the natural approach since we have assumed the existence of a prior 
distribution, p(x). Generally Bayesian estimators attempt to minimize the average cost of an 
erroneous segmentation. This is done by solving the optimization problem 

x = arg min E [C(X,x)|Y = y] 

where C(X,x) is the cost of estimating the true segmentation, X, by the approximate segmentation, 
x. Notice that X is a random quantity whereas x is a deterministic argument. Of course, the 
choice of the functional, C(y), is of critical importance since it determines the relative importance 
of errors. 

In order to understand the deficiencies of the MAP estimate, we will first look at the assumption 
of its derivation. The MAP estimate is the solution to (3) when the cost functional is given by 

Cmap (X,x)=l-5(X-x) 
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where 6(X - x) is 1 when X = x and 0 otherwise. Since C map (X,x) = 1 whenev 

incorrectly labeled, the MAP estimate maximizes the probability that all pixels will be correctly 
labeled. Of course, a segmentation need not be completely accurate at all pixels to be useful. Even 
good segmentations will normally have erroneously classified pixels along region boundaries. This 
is particularly true in high resolution images where the misclassification of a single pixel is not 
significant. Therefore, the MAP estimate can be excessively conservative [19, 20]. 

The implications of the MAP criteria appear even more inappropriate for the estimation of the 
MSRF introduced in the previous sections. The cost function used for MAP estimation of a MSRF 
is 

Cmap (X,x)=1-5(X-x) 

L 

- 1 - 5(X (n) ~ x (n) ) . 

n=0 

8 
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This cost function is 1 if a labeling error occurs at any scale, n, of the segmentation. Conse- 
quently, this function assigns equal cost to a single mislabeled pixel at n = 0 or the mislabeling of 
approximately 256 pixels at n = 4. This cost assignment is clearly undesirable. 

Ideally, a desirable cost function should assign progressively greater cost to segmentations with 
larger regions of misclassified pixels. To achieve this goal, we propose the following alternative cost 
function 

i L 

C SMA p (X,x) = + 2 n_1 C n (X,x) 
i 
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where 

L 

C„(X,x) = l - 5(X (i) ~ x (i) ). 

i=n 

The behavior of C smap is solely a function of the coarsest scale, K, that contains a misclassiJ 
pixel. More precisely, let K be the unique scale such that X ^ =x m ,butl' 

i > K. Then the functions C „ are given by 

„ „ x 1 if n <TK 

Cn(X ' X)= Oifn>K 

and the total cost is given by C smap (X,x) = 2 K . This error at scale K will general! 

the misclassification of a group of pixels at the finest scale. The width of this misclassified group 
of pixels will be approximately 2 K =C S map (X,x)! Therefore, the SMAP cost function 

following intuitive interpretation. 

C smap (X,x) width of the largest grouping of misclassified pixels 

We can determine the estimator which minimizes this proposed cost by evaluating (3). 

x = argmin . ^ E [C smap (X,x)|Y = y] 

L 

= argmin 2 n_1 1 - P(X (i) =x (i) i ^n|Y = y) 



X 

n=0 



L 

= argmax 2 n P(X (i) =x (i) i^n|Y = y) 



X 

n=0 



Since the random fields, X (n) , form a Markov Chain, we will compute this estimate recursn 

in the scale parameter n. This is done by assuming thatx (i) has been computed : 
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using this result to computex {n) . In Appendix A, we show that this recursive approach yiel 

following expression for the solution. 

x (n) =argmax logp x(n) , X(Q+1) (x (n) I* (n+,) ,y) + E(x (n) ) 

x 00 >3 

where £ is a second order term which may be bounded by 

0 ^E(x (n) ) ^max p x „ , x(n) y (x (n_1) l* (n) ,y)« 1 . 

x {a ' l) 

Table 1 gives computed upper bounds for E as a function of scale. (Details of the computation 
are given in the Section 4). For our problem, the approximation that E « 1 is very good. To 
see this, notice that x (n-1) is an interpolation of the coarser segmentation^ (n 
y. Normally, there will be many pixels in the interpolation x (n_1) for which the cc 

uncertain. This is particularly true around the boundaries of objects. Since the number of unique 
labeling combinations for these pixels is enormous, the probability of any particular combination 
will be small. In fact for the models we will use, this probability goes to 0 as the number of pixels, 
N, increases. Therefore, 

lim £( x (n) ) = 0. 

At very coarse scales, the number of labels becomes small, and often only one reasonable 
interpolation will exist (i.e. E ^ 1). However, in this case the correct labeling of pixels at the coarsei 
scale, n, must also be unambiguous, and any reasonable estimator should have good performance. 

Ignoring the contribution of E results in the following recursive equations. 

x W = arg max log p x » | y (x w M 

X (L) 

x (n) = arg max logp x ( n) , x(n+1 > (x (n) I* (n+1) ,y) 

x (0) ,y 

The recursion is initialized by determining the MAP estimate of the coarsest scale field given the 

-1 J J~*~ TT ~* <= 1- C J U ^ *|__ WAF» ^ 



http://scholar.googlexom/schola^ 9/29/05 



A Multiscale Random Field Model for Bayesian Image Segmentation Page 14 of 57 

uusci veu uaia. 1 11c segiiiciiiauuii ai caun niici suaic is uicu luiwu uy computing uic ivi^vr csuniaic 
of X (n) givenx (n+1 * and the image, y. Due to this structure, we refer to this estimator as a 
sequential MAP (SMAP) estimator. 
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Scale Pixel Size Bound on E 
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Table 1 : Upper bound of error term E as a function of scale for three 
512 x 512 images. At moderate and fine resolutions, the correct segmen- 
tation is uncertain and E very small 

By using Bayes rule, the Markov properties of X, and assuming that X c 
tributed, the SMAP recursion may be rewritten in a form which is more easily computed. 

x (L) = arg max log p , » (y|x w ) 

x (n) y| 

x (n) = arg max^ ^ log p y|x « (y|x (n) ) + log p x (D) | x M , (x (n) l x (n+I) 

The two terms in (5) play roles analogous to those of the likelihood function and prior distribution 
in conventional Bayesian estimation. The first term of the maximization, gives the likelihood of 
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ifte observed data y given tne labeling at scale n. me second term 01 tfte maximization embodies 
the a priori information about the behavior of X. Therefore, this term biases the solution to favor 
segmentations with large regions and smooth boundaries. 

The SMAP estimator has a number of additional advantages. In the next section, we will 
introduce specific models so that each optimization step of (5) may be computed with a single 
noniterative pass. This is in contrast to the MAP and MPM estimators which require computa- 
tionally expensive iterative optimization methods [12, 3, 19]. Further, the SMAP estimator has a 
subtle advantage over MPM. The MPM method chooses each pixel individually to minimize the 
probability of error. However, it does not consider the spatial placement of errors. Since the SMAP 
method attempts to minimize the spatial size of errors, it will tend to generate a subjectively more 
desirable segmentation. 

11 
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3 Segmentation algorithm 

In this section, we define the specific models we will use for Y and X. This will be done by specifyir 
the conditional density p y[x (0) (y|x (0) ), together with the coarse to fine scale transition dens 

p x (o) |x (o+i) (x (n) l x (n+!) ). We then develop an adaptive segmentation algorithm which estimates ■ 
parameters of the MSRF during the segmentation process. 

For the multispectral segmentation problem, we restrict ourselves to models with observed 
pixels that are conditionally independent given their class labels. This implies that the spatial 
texture of regions will not be used as a discriminating feature. Instead, we shall rely on the 
multispectral characteristics of classes to discriminate distinct regions. This approach is supported 
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by the fact that spatial con-elation has been found to be weak within regions of multispectral 
images corresponding to a single ground cover [39, 40]. Using this assumption, the conditional 
density function for the image has the form 

P y|x <o) (y|x (0) )= P yJX <o>(y s l x < 0) ) 

where p | X «» (*|k) is the conditional density function for an individual pixel given the class label \ 
Since each pixel is composed of D multispectral components, p , (0) (• |k) is a mul 

function. We will use a multivariate Gaussian mixture density in numerical experiments, but other 
distributions can just as well be applied. 

More generally, the methods used throughout this paper are applicable to any model which can 
be expressed in the form 

logp y|x (o)(y|x (0) )= l s (y|xf } ) + c(y). 

s*S (0) 

where the functions 1 s depend on all of y, and c is an arbitrary function of y. This type of mode 
has been used extensively in texture segmentation applications [4, 41, 35, 15]. 

We will restrict our choice of models for X to have two important properties. First, the pixels in 
X (n) will be conditionally independent given the pixels in X (n+1) . Second, each pix< 

be dependent on a local neighborhood of pixels at the next coarser scale. This set of neighboring 

12 
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locations to s will be denoted by 5s. Based on these properties, the transition distribution from 

aoowo tr\ f!no cr-n\o miict Vtoiro fVvrm 



http://scholar.googlexom/scholar?hl^n&lr=&^ 9/29/05 



A Multiscale Random Field Model for Bayesian Image Segmentation 



Page 17 of 57 



wttiov ivj liiiv ovuiv inuoi utirv inv xuim 

/ v (n)| x (n+l) x_ ^ ✓ (n) | x ( n+l ) \ 

P x (» |X 0*1) (X l A ) - P x ( D ) jx (o+l ) (* S 1 %, ) 

where p oo jx ( n+ o is the probability density for x given its neighbors at the coarser S( 

Xs a, 

These two assumptions assure that the pixels in x (n) will be conditionally indef 

the pixels in x (n+1) . However, we note that nearby pixels in x (n) may still be hi 
since they will share coarser scale neighbors. 

The choice of neighborhood 3s for the pixel s is important since it will define the structure 
of our multiscale pyramid. We will employ two types of neighborhoods. The first corresponds 
to a quadtree structure and allows simple and exact calculation of the SMAP segmentation. The 
second neighborhood corresponds to a more general graph structure and can account for more 
complex interactions that occur across blocks of the quadtree. Therefore, this model produces 
smoother, less blocky segmentations. Unfortunately, the graph structured pyramid does not allow 
exact calculation of the SMAP estimator. Therefore, a hybrid model, which incorporates both the 
quadtree and graph structure, is used to approximate the exact SMAP estimate. 

3.1 Quadtree Model 

The first pyramid structure that we consider is based on a conventional quadtree. The structure of 
the quadtree is shown in Fig. 3a, and the one dimensional analog is shown in Fig. 3b. Since this is 
a tree structure, each point in the pyramid is only dependent on a single point at the coarser scale. 
This coarser scale neighbor of a point s is called the father of s and will be denoted by the function 
d(s). In a similar vein, the n m successive father of a point will be denoted d n ( 

be the set of all points which occur n levels down from s in the tree structure. 

We choose the following transition function to model the probability that X 
given that its father is of class k. 

p.w | X <«i> (m|k) = 0 n,o + 
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(a) (b) 

Figure 3: a) Quadtree structure used for MSRF model, b) One dimen- 
sional analog to quadtree structure. 

where 5 m ,k is the unit sample function and M is the number of possible classes. The parameter 
$ n 0 ^[0,1] is the probability that the labeling will remain the same from scale n+ 1 to n. If a class 
change does occur, it is equally likely to be any one of the remaining class types. At fine resolutions, 
the neighboring pixels are more likely to have the same class. Therefore, 0 ^ > 

increasing function of resolution (decreasing function of n). Notice that this distribution only 
depends on the the scale n through 0 n 0 but does not depend on the particular pixel s. 

An important property of the quadtree structure is that the conditional distribution of Y given 
X (n) has a product form that may be computed using a simple fine-to-coarse recursion. Let y 
to be the set of leaves of the quadtree which are on the branch starting at s eS 
ys (n) = fy r • d n (r) = s). In appendix B, we show that the conditional density for Y given X 
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the product form 

Py i x oo (y|x (n) )= p v « |x « (yi n) K n) ) 



seS (n) 



Furthermore, the density functions p 00 u 00 may be computed using the following recur* 

M is the number of class labels. 



M 



p (o +1) |x (n + .) (y< n+1) l k ) p » . „ (y r (n> l m )P '« , x ( n+ D ( 

red (s) ni=l 

In practice, dynamic range considerations mandate that the logarithm of these functions be com- 
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puted and stored. Therefore, we define the log likelihood function 

l< n) (k) = logp „ |xW (y<"> Ik). 
y$ 1 s 

where the dependence on y is suppressed for clarity. Substituting the transition distribution of (6) 
into (8) and converting to log likelihood functions yields the new recursion 

lf ) (k) = logp y> « (yjk) 
^ 00= log* n , 0 exp/l r (n) (k); + 1 ~*"' 0 M ex P /l< n >( 

red (s) m=l 

to perform the SMAP segmentation, we must compute these log likelihood functions at every 
point in the quadtree. The number of points in the pyramid is approximately given by 
is the number of pixels at the finest scale. Since each of the log likelihood functions may be stored in 

r c\* 1 J ~* : ; /a\ *xn/-> r> *i j 
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inc lumi ui m nuiiiucis, uic luuii rcquircu siuragc is appruxmiaiciy jhiviin;/;>. occausc uic sccuuu 
sum of (9) is not a function of k, each log likelihood function can be computed in time proportional 
to M. Therefore, the total computation time for evaluating the log likelihood functions is 0(MN). 
We note that in the limiting case of 0 n ,o (k) = 1 the recursion reduces to simple avera 

i s (n+,) (k>= _ i^ n) (k) 

red (s) 

This is equivalent to assuming that all the pixels in the block of pixels, y , n 

label. Of course, this is often not the case since some of pixel blocks are likely to fall on region 
boundaries. These blocks then have a mixture of pixels from the two regions. Therefore, simple 
averaging can cause pixels on region boundaries to be misclassified. This is particularly true if 
the statistical average of the two regions appears to have characteristics of a third class. Another 
advantage of the more accurate recursion, (9), is that a small group of anomalous pixels will not 
adversely effect the classification of a large region. Simple linear averaging of the log likelihood 
function tends to be easily biased by a few pixels that are statistical outliers, whereas the more 
accurate recursion is robust to such errors. 

Once the likelihood functions are computed, the SMAP segmentation may be efficiently com- 
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puted using (5).x 

<"> =argmax 0) 1» (x<"> ) + logp (x » |x£ +1 > ) 

This expression is easily evaluated by minimizing with respect to each pixel individually 
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x< n) ^argmax^ l s (n) (k) + log p ^ , x <o +0 (k|x ). 

This coarse-to-fine segmentation operation requires order 0(MN) computation time. Therefore, 
the complete segmentation process consists of a fine-to-coarse and a coarse-to-fine operation, each 
of which require O(MN) computation. 

While the quadtree model results in an exact expression for computing the SMAP segmenta 1 
tion, it does not completely capture some aspects of image behavior. In the following section, we 
introduce an augmented model which improves on the quadtree. 

3.2 Pyramidal Graph Model 

An important disadvantage of the quadtree model is that pixels which are spatially adjacent may 
not have common neighbors at the next coarser scale. Therefore, the model does not enforce 
continuity of region boundaries when they pass across branches of the quadtree. For example, if 
the image is broken into four quadrants at the second level of the quadtree, then a region boundary 
will not be constrained to be smooth as it passes from one of these quadrants to the next. 

This weakness may be corrected by increasing the number of coarse scale neighbors for each 
pixel. Such a pyramidal graph structure is shown in Fig. 4a, and an analogous one dimensional 
graph is shown in Fig. 4b. Notice that each point has three neighbors at the next coarser scale. 

In order to express the positions of these neighbors, we explicitly denote a pixel at scale n as 
s = (ij) where i and j are the row and column indices starting at 0 and ranging through the width 
minus one, and the height minus one respectively. The three neighbors at scale n + 1 may then be 
computed using the function odd(i) = 1 if i is odd and -1 if i is even, and the greatest smaller 
integer function •. 

s, = (i/2,j/2) 

16 
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(a) (b) 

Figure 4: a) Augmented pyramidal graph structure used for MSRF model, 
b) One dimensional analog of pyramidal graph structure. 



s 2 =(i/2 J/2) + (odd(i),0) 
S3 -(i/2J/2) + (0,oddG)) 

The transition function which we choose for this pyramid graph is a natural extension of the 
transition function used for the quadtree based model. 



~Po» | x ( D+ i) (m|ij,k) 
= P(X s (n > = m|X s ( : +,) = i,X <f» =j,X =k) 



= 9a L y WmS +25 mJ +25 m>k ) + 1 

/ M 

Notice that we use the notation ~p to distinguish from the transition distribution used for the 

quadtree model. As with the quadtree model, the parameter 6 M e [0, 1 ] determines 

that the label of the fine scale point will be the same as one of the coarser scale points. Conversely, 
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1 - 0 n,i is the probability that a new label will be randomly chosen from the available labels. 

The disadvantage of the pyramid graph structure is that the likelihood function for the labels 
does not have a product form as was the case for the quadtree in (7). Therefore, there is no simple 
fine-to-coarse recursion with the form of (8) for computing the likelihood of the image, y, given the 
labels, x (n) . 

For computations at a single scale, n, this problem may be circumvented by assuming that the 
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pyramidal 
graph 

n=2 

quadtree 

Figure 5: One dimensional analog to hybrid graph structure. Scales n > 2 
use a pyramidal graph structure, and scales n ^2 use a quadtree structure. 

pyramid has a quadtree structure for scales finer than n, and a graph structure for coarser scales. 
A one dimensional analog to this hybrid pyramid structure is shown in Fig. 5 for n = 2. Notice 
that for levels above n the pyramid is a graph, but below n the pyramid has a simple tree structure. 
Using this hybrid pyramid, the conditional likelihood of (5) then has the computable form 

log P „ <■) |x c-o (V* (n) I* (n+I) ) = 1<"> (x « ) + log ~p x?) |x ^ (X » 
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which results in the following formula for the SMAP estimate of X (n) 

=argmax 1^ (k) 
x< n) =argmax l s (n) (k) + log ~p » |x <-., (k|x ) . 

where ~p is the transition function of (1 1), and 1 ^ (k) are computed using the recursion 

Of course, the application of the above formula at all scales is an approximation to the true 
SMAP segmentation since we may not legitimately change our model during the segmentation 
process. However, we believe that this approximation is reasonable since the likelihood functions 
are primarily dependent on the image data and have only a secondary dependence on the pyramid 
structure. Intuitively, if the pixels in y s (n) appear to be principally from class k, then the li 

function 1 s n) (k) should be relatively large regardless of the pyramid structure used. 

All the following analysis and experimentation will assume the use of this hybrid quadtree-grapl 
structure. As in the case of the quadtree, segmentation using the hybrid structure is performed 
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in two steps each of which only requires order MN computation. The first step is a fine-to-coarse 
computation given by (9). The second step is a coarse-to-fine computation given by (13). Together 
the total computation is of order 0(MN). 

3.3 Parameter Estimation 

In typical applications, one does not have prior information about the exact behavior of the seg- 
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it is being segmented. To do this, we must estimate the parameters 0 n = [6 a 

segmentation process. 

The parameters 6 n ,o are required for fine-to-coarse operations used in computing the log li 
hood functions, and the parameters 6 n \ are required for coarse-to-fine operations used in t 

segmentation. However, we will estimate both of these parameter vectors during the coarse-to-fine 
operations. This means that the segmentation process will require two full passes composed of the 
following steps. 

1. Perform fine-to-coarse operations using initial parameter values 6 n>0 = 1. 

2. Estimate parameters 6 n ,o and 6 n ,i during coarse-to-fine operations. 

3. Perform fine-to-coarse operations using estimated parameters. 

4. Reestimate parameters 8 nJ during final coarse-to-fine segmentation. 

The estimation procedure of 2 and 4 above is performed sequentially at each scale. Each transi- 
tion parameter vector, 6 n , is estimated concurrently with the computation of the segmentation: 
The computational overhead of this estimation procedure is greatly reduced by subsampling the 
image at high resolutions. Since the number of pixels at high resolutions is great, this subsampling 
does not substantially impact on the accuracy of the estimated parameters. 

The two pass process implies that parameter estimation will increase computation by at least a 
factor of 2. However, the additional computation required within each iteration is minimal due to 
subsampling. So the total increase in computation for parameter estimation is generally close to 2. 

19 



Page 20 



http://scholar.googlexonVscho^ 9/29/05 



A Multiscale Random Field Model for Bayesian Image Segmentation 



Page 26 of 57 



We begin by deriving the sequential parameter estimation procedure. The transition parameter 
0 n>1 is estimated by finding the maximum likelihood value given the image, y, and the previous 
coarse scale segmentation^ (n+1) . Formally, we compute the solution to the optimization cr 

0^ eargmax p (0+l) (y|x (n+1) ,6 ^ ) 

where £2 is the set of valid parameter values. Using the hybrid pyramid structure of section 3.2, 
the log likelihood function, L, has the specific form 

L(*n,l ) = l0g P y|x(n+ o (y| X (n+1) ,0 n ,l ) 

M 

log exp(l < n) (k))~p (o) |x(D+ ,) (k|x^ +1) ,0,1 

seS (n) " 

where the dependence of ~p on n is through the parameter 9 n ,i . Notice that ~p uses 

x o } ' X ao ^ t0 em Pl ias i ze that the conditional distribution assumed in (1 1) does not depend on s. 

This likelihood, L{6 n ,i ), may be maximized as a function of 0 n,i by using 

maximization (EM) algorithm [33, 42]. From the form of the function ~p, we know that L(0 
a convex function of 0 n ,i • We will use this fact to show that the EM algorithm is guaranteed t< 
converge to a value of 0 n l which maximizes L. 

To apply the EM algorithm, we must first compute the function 

Q(6 n,i ,0 n ,i ) = E log p y|x » (y|X (n) )p x(a) | X (o +I) (X (n) Ix < n+1 > 9 0 ) |Y = y,X (n: 

where we remind the reader that X (n) is a random object. The EM algorithm is then the fc 

iterative procedure for finding a sequence of parameters which converge to0 n> i 



To further simplify this expression we must formulate a sufficient statistic for the transition dis- 
tribution^ in) , x <o*.) (m|ij,k,0 „,! ) of (11). This statistic counts the number of distinct transi 
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that occur from a particular set of coarse scale neighbors, x , to a point x ! 
may define the sufficient statistic, T, so that 

l 2 

log^p (a) . (D+ i) (m|ij,k,0 n>1 )= T lh (m|ij,k)V 1>h (0 n>1 ) 

0 go 

1=0 h=0 
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where T and V have the functional forms 



T 1>h (m|ij,k) = 



lifS ^ = land 5 mJ + 6^ =h 
0 otherwise 



and 



.V,, h (0 nJ ) = log ^ (31 + 2h) + 1 * nJ . 

7 M 

By substituting (14) into the expression for for Q, we obtain the simpler expression 

1 2 

0$ sargmax T ,, h (<, )V 1>h (5 n ,, ) 

0,1 1=0 h=0 

where 

f,, h 0U.)= ET ,, h (X s (n) Ix^ +1) )| Y = y^ < n+1 > =x < n+l > ,*„,, 

T,, h (k|x^ +,) )exp(l i n) (k))-p x(0) |X(D+1) (k|x£ + 

_ o so 

M n W /I w - n x ( n+! ) /i 

Evaluation of Tj )h is computationally expensive since it requires a summation over all the p« 
inS (n) . However, accurate estimation of 6 n only requires a representative sampling of t 

data. This is particularly true at high resolutions where the number of pixels far exceeds the number 
reauired to accuratelv estimate these parameters. Therefore, we subsample the points in S 
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period P (n) in both the horizontal and vertical directions, and we choose P (n) 
will still have an increasing number of samples at finer scales. Experimental results given in the 
following section will show that subsampling substantially reduces computation without adversely 
effecting the performance of parameter estimation. 

The two steps of each iteration in the EM algorithm are then 

E- Compute T using the parameter 6 £, , sampling period P (n) , and (16). 

M - Compute 8 ^argmax ^ ^ Q(0 n l fi n P , ) using (15). 

The M step can be efficiently computed since it requires the maximization of a convex function Q 
over an interval. This may be done with, for example, the golden section search method [43]. The 
question remains as to whether the algorithm converges to the global maximum of L. To answer 
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this question, we adapt the convergence results of Wu [44], and Redner and Walker [45] to prove 
the following theorem in Appendix C. 

Theorem 1 If i) Q is a compact, convex set, ii) L(0) and Q(0,0 ) are continuous and differentiable 
on an open set containing Q, iii) L(0) is convex, then any limit point,0, of the sequence fd 
the property that 

0 ^ are max L(0) 

Notice that assumption ii) does not strictly hold since Q becomes infinite at the points 0 and 

1 . In practice, this problem can be resolved by estimating 0 ^ , over some small 
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[0 + | ]. In any case, the introduction of i is required for numerical stability. 

Finally, the parameters 0 n ,o are estimated using the statistics T resulting from 

the EM algorithm. 

fl h=o T Uh 

1=0 h=0 1 l,h 

These values are then used in the second pass of fine-to-coarse computations. 

The complete SMAP segmentation algorithm with parameter estimation is summarized below. 

1 . Set the initial parameter values for all n,0 n0 =1, and0 L - {1 = 0.5. 

2. Compute the likelihood functions using (9) and the parameters^ n> o . 

3 . Computex ^ using ( 1 2). 

4. For scales n = L - 1 to n = 0 

(a) Use the EM algorithm to iteratively computed n ,i and T. Subsampk 

computing T, and stop when \8 J^ 1 " ^ * , I < 2 • 

(b) Computed n>0 using (17). 

(c) Computex (n) using (13). 

(d) Set0 n _ u =6 n>l (1-10 2 ) 

5. Repeat steps 2 through 4. 
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Since global convergence of the EM algorithm is guaranteed, the choice of 
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impacts on the accuracy of convergence. We have used the values i = 10 a 

our experimentation, and have never found them to be problematic. Generally, smaller values of 

-4 

2 =10 will lead to better estimates but slower convergence, and we have found that 
We also note that step 4d is used to accelerate convergence of the EM algorithm by starting the 
new parameter values near the previous parameter values. 

4 Experimental Results 

In this section, we compare the performance of the SMAP algorithm with MAP segmentation using 
a MRF prior model for the pixel classes. This is done using a variety of synthetic images. 

All results of the SMAP algorithm used unsupervised estimation of the MSRF parameters. 
Unless otherwise stated, subsampling was used in all experiments, and the subsampling period was 
always chosen using the formula 

P(n) = max/2 x} 

This generally resulted in sufficient sampling to accurately estimate parameters while keeping the 
computational overhead of parameter estimation to a minimum. 

For our comparison to MAP estimation, we choose a conventional 8pt neighborhood MRF 

model [15] for the class labels X. Specifically, the probability distribution has the form 

1 V V 

p x (x)= exp-X 2- It i(x) + t 2 (x)/ 2 

z 

where X = 1 .5, t i is the number of horizontal and vertical neighbors of different class, and t 
number of diagonal neighbors of different class. This appeared to yield the best overall results. 

Since the MAP estimate can not be exactly computed, an optimization method must also be 
chosen. We used simulated annealing (SA) [12], ICM [3], and multiple resolution segmentation 
(MRS) [15]. The ICM and SA methods were started with the maximum likelihood estimate of the 
segmentation. SA used a temperature schedule of the form 
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1 

Tn+l 



+ A. 
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Class designation 

0 1 2 3 4 5 

image 1 u 127.0 145.0 101.6 163.0 76.1 199.0 

a 32 32 32 32 32 32 

image 2 u 127.0 137.1 1 12.7 147.2 98.4 167.5 

a 32 32 32 32 32 32 

image 3 u 127 127 127 127 127 127 

a 8.00 10.55 13.93 18.37 24.25 32 

Table 2: Mean and standard deviation of each texture for three different 
images. 

where T o = 1, T final = .2666 and A was chosen to achieve the desired number of iterations. 
We used both 100 and 500 iterations to compare the performance. After the desired number of 
iterations, ICM was used (equivalently T = 0) to assure convergence to a local minima. These 
annealing parameters were chosen since they seemed to give the best performance over the range 
of test images used. In practice, the choice of annealing parameters must be a compromise since 
the optimal parameters depend on the specific image being process. 

The MRS algorithm differs from ICM and SA since it effectly uses different values of Xat each 
scale. The scale dependent X is used because the MAP estimate would otherwise have unreasonable 
behavior at coarse scales[15]. Intuitively, the MRS algorithm attempts to approximately correct 
the undesirable properties of the MAP estimator by varying the prior model. 

Three 512 x 512 synthetic images shown in Fig. 6 were used to test the SMAP and MAP 
aleorithms. Each contained 6 white Gaussian textures with distinct means and variances as shown 
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in Table 2. In each case, the background is class 0, and the circles are numbered in order starting 
from the top. The radius of each subsequent row of circles was chosen to decrease by a factor of 2. 

Fig. 7, 8, and 9 show the result of segmenting the three test images using the SMAP and MAP 
algorithms. The percentage of misclassified pixels is tabulated for each class in Table 3, and the 
computational requirements are shown in Table 4. For comparison, each table also lists the results 
of SMAP segmentation without pixel subsampling, and the results of the MRS algorithm. For 
these test images the performance of SMAP with and without subsampling was not significantly 
different. However, subsampling reduced computation by approximately a factor of 3. Table 5 gives 

24 
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the specific values of the estimated parameters at each scale. It also supports the conclusion that 
subsampling does not significantly degrade the accuracy of the estimates. 

Inspection of the segmented test images indicates that the SMAP segmentation performed com- 
parably to or substantially better than 500 iterations of SA. The SMAP algorithm appears to make 
fewer large misclassification errors. This may be due to the improved objective of minimizing the 
largest misclassification. We also note that the less smooth boundaries of the SMAP segmentation 
sometimes resulted in slightly higher misclassification rates for the smooth shapes in these synthetic 
images. However, the less smooth boundaries may actually be more desirable for segmenting the 
natural shapes of real images. Generally, the SMAP segmentation required less computation than 
even the worst performing algorithm, ICM, and much less computation than SA. 

Inspection of Fig 7 indicates that, for the parameter value of X= 1.5, the MAP segmentations 
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tend to produce smoother boundaries than the SMAP algorithm. Since the circular regions of this 
synthetic image have very smooth boundaries, this results in slightly better classification accuracy 
for the 500 iteration MAP segmentation. However, the 100 iteration MAP segmentation has still 
not completely converged. Notice that along boundaries groupings of pixels form which are mis- 
classified into a third class representing the average behavior of the two classes. Also, scattered 
misclassification of individual pixels in the MAP segmentations result from the inability to control 
small scale behavior independently of large scale behavior. The ICM algorithm performed very 
poorly. 

Fig 8 shows that for image 2 the overall performance of the SMAP algorithm is substantially 
better than 500 iterations of SA for all classes but 4. As before, false average classes formed at 
region boundaries for the MAP segmentation. For this image, both the 100 iterations of SA, and 
ICM failed to properly segment the image. 

Fig 9 is distinct because the mean of each region is equal, and classes are only distinguished 
by their variance or texture. In this example, the SMAP algorithm produces better results than 
the alternative methods. The largest regions are completely absent in the 500 iterations of SA. 
This may be due to the suboptimal solution of SA, or the nature of MAP criteria. Also, the 100 
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image 1 



Class label 
0 12 3 

SMAP (subsampling) 100% 95% 96% 94% 93% 78% 

SMAP (no subsampling) 100% 95% 96% 94% 93% 78% 

SA 500 100% 97% 97% 96% 95% 96% 



4 



5 
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ICM 
MRS 

image 2 SMAP (subsampling) 
SMAP (no subsampling) 
SA500 
SA 100 
ICM 
MRS 

image 3 SMAP (subsampling) 
SMAP (no subsampling) 
SA500 
SA100 
ICM 
MRS 



yyyo yjyo yoyo yDVo iovo yoyo 
60% 3 1% 62% 74% 94% 96% 
100% 96% 96% 95% 92% 95% 
99% 94% 93% 86% 53% 70% 
99% 94% 93% 87% 60% 67% 
100% 81% 44% 59% 69% 74% 
74% 57% 10% 32% 74% 77% 
23% 12% 6% 14% 97% 96% 
99% 93% 92% 74% 63% 72% 
100% 95% 95% 92% 61% 58% 
100% 94% 95% 91% 60% 60% 

100% 0% 94% 92% 18% 75% 
100% 0% 40% 49% 52% 62% 
100% 0% 1% 14% 14% 62% 
100% 95% 96% 80% 52% 57% 



Table 3: Percentage of each class label which was correctly classified, and 
class averages of classification accuracy. 

iterations of SA have not converged. 

Table 4 shows the number of replacement operations required per image pixel for each method. 
For the SMAP algorithm, replacement operations include either of the following two basic opera- 
tions: evaluation of a pixel's class using (13), or evaluation of the expectation term in (16) at a 
single pixel. All of these replacement operations are comparable since they each require order M 
operations. 



SMAP (no 
estimation) 

imagel 1.33 
image2 1.33 
image3 1.33 



SMAP (sub- 
sampled esti- 
mation) 
3.13 
3.55 
3.14 



Replacements per pixel 

SMAP (no 
subsampling) 

9.12 
10.47 
8.15 



SA 500 SA100 ICM 



504 
506 
505 



105 
108 
104 



Table 4: Number of replacement operations required per image pixel for 
the three synthetic test images. The SMAP algorithm is listed with and 
without parameter estimation. 
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image 1 
image 2 
image 3 



image 1 
image 2 
image 3 



Subsampling 
No Subsampling ( 

Subsampling 
No Subsampling ( 

Subsampling 
No Subsampling ( 



Subsampling 
No Subsampling ( 

Subsampling 
No Subsampling ( 

Subsampling 
No Subsampling ( 



Estimated Parameters 0-3 

(#0,0 £ 0,1 ) (0 1,0 £ 1,1 ) (0 2,0 fi 2,1 ) 

( 0.9872,0.9981) ( 0.9815,0.9990) ( 0.9718,0.9953) ( 0.9607,1.01 
0.9876,0.9980) ( 0.9833,0.9989) ( 0.9723,0.9974) ( 0.9533,0.9952) 

( 0.9835,0.9945) ( 0.9749,0.9954) ( 0.9677,0.9932) ( 0.9523,1.01 
0.9851,0.9961) ( 0.9784,0.9971) ( 0.9688,0.9963) ( 0.9531,0.9936) 

( 0.9866,0.9989) ( 0.9769,0.9998) ( 0.9629,0.9963) ( 0.9175,1.0< 
0.9860,0.9989) ( 0.9773,0.9998) ( 0.9594,0.9979) ( 0.9197,0.9947) 

Estimated Parameters 4-7 

(# 4,0 fi 4,1 ) (# 5)0 ,6 5,1 ) {0 6,0 fi 6,1 ) 

( 0.9124,0.9899) ( 0.9132,0.9714) ( 0.9444,0.9326) ( 1.0000,1.01 
0.9124,0.9900) ( 0.9132,0.9712) ( 0.9444,0.9325) ( 1.0000,1.0000) 

( 0.9126,0.9852) ( 0.9004,0.9745) ( 0.9444,0.9328) ( 1.0000,1.01 
0.9126,0.9852) ( 0.9004,0.9745) ( 0.9444,0.9329) ( 1.0000,1.0000) 

( 0.7728,0.9823) ( 0.7214,0.9804) ( 0.6401,0.9329) ( 0.2500,1.01 
0.7728,0.9823) ( 0.7214,0.9804) ( 0.6401,0.9329) ( 0.2500,1.0000) 



Table 5: Effect of subsampling on parameter estimation. 



For each test image, the SMAP algorithm with parameter estimation (and subsampling) re- 
quired considerably less computation than ICM and much less than SA. The addition of parameter 
estimation to the SMAP algorithm increased the computation by approximately a factor of 2. Im- 
age 3 is the worst case with an increase of 2.66. We note that this measure of computation does 
not include the calculation of likelihood functions. 

In order to test accuracy with real data, a multispectral remotely sensed image was segmented 
and the results were compared to measured ground truth to determine classification accuracy. 
Fig. 10 shows a typical 430 x 400 subregion of a 927 x 1097 three band SPOT image with a spatial 
resolution of 20m. The SPOT image is displayed in color with the infrared band as red, the visible 
red band as green, and the green band as blue. Ground truth information was collected along 
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90 positions (transects) placed randomly throughout the full image [46] (only a subset of these 
transects are contained in the displayed subregion). Each transect is 100m long (approximately 5 
pixels) with a random orientation. Along each transect, detailed measurements of ground cover 
where made, but for this experiment we only consider five classes based on the aggregate measure oJ 
percent bare ground for each transect. 60 of the transects were randomly chosen to use for training 
the class models, and the remaining 30 were used to test the segmentation algorithms performance. 

27 
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Each of the five classes was modeled as a multivariate Gaussian mixture. The parameters of 
the mixture model were estimated using the EM algorithm [45] and the model order was chosen 
for each class using the Rissanen criteria [47]. The mixture model is important because it captures 
the multimodal spectral behavior that is typical of most classes. 

Subregions of the SMAP, SA, ICM and maximum likelihood segmentations are shown in Fig. 1 
Only 100 iterations of SA were used due to the excessive computation time required to process the 
large image. The percent misclassification was also computed using the 30 testing transects for 
each class. The results are tabulated in Table 6. For each class and algorithm, the average region 
size was also computed. 

The classification accuracy of the SMAP segmentation was substantially better than the maxi- 
mum likelihood algorithm and slightly better than ICM. The SMAP and SA algorithms had approx- 
imately comparable accuracy with SMAP performing 5.3% better for class 2 and SA performing 
2.5% better for class 5. Inspection of the segmented images in Fig. 11, indicates that the SA al- 
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gorithm produced smoother boundaries than the SMAP result. This extra smoothing of the SA 
algorithm tends to remove the smaller regions which are more common in class 2. 

It is also interesting that the SMAP segmentation produces the largest average region sizes. 
This is because the SMAP segmentation produces fewer regions containing very few (e.g. 1 or 2) 
pixels. 

5 Conclusion 

We have presented a new criteria and model for statistical image segmentation. The SMAP esti- 
mator is proposed because it minimizes the expected size of the largest misclassified region, and 
it results in a segmentation algorithm which is computationally simple. The MSRF model uses a 
pyramid structure to capture the characteristics of image behavior at various scales. Because the 
MSRF has a Markov Chain structure, its parameters can be estimated efficiently from the image 
during segmentation. 

Experiments with synthetic data indicate that the SMAP algorithm performs comparably to 
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Class 1 Class 2 Class 3 Class 4 Class 



Percent Bare Ground 




0% 


1-10% 


11-21% 21-30% 31-100% 


Accuracy 


SMAP 


88.1% 


27.8% 


17.5% 


27.5% 


93.5? 




SA100 


88.1% 


22.5% 


17.5% 


27.5% 


96.0? 




ICM 


87.5% 


23.2% 


17.5% 


27.5% 


93.5? 




ML 


78.5% 


26.5% 


17.5% 


20.0% 


85.1? 


Average Region Area SMAP 




131.7 


18.7 


10.7 


21.2 


57.7 




SA100 


126.9 


16.4 


7.5 


14.7 


49.9 






i i 








M M IS 
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1UVI 11Z.Z O.Z I.L iz.y <¥\.l 

ML 32.1 3.9 3.9 4.7 16.1 



Table 6: Tabulated results for the segmentation of multispectral SPOT 
data with ground truth. Classes were formed from ranges of percentage 
bare ground, and percent classification accuracy was tabulated for each 
class. The average region size for each class is also listed. 

or substantially better than MAP estimation using a Markov random field model and simulated 
annealing. In addition, the SMAP algorithm requires less computation than ICM and much less 
then simulated annealing. The SMAP algorithm was also tested on multispectral SPOT data and 
found to improve segmentation accuracy over ML segmentation. 
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A Appendix 

Assume thatx (l) has been computed for i > n. We will then computex (n) using 

structure of the random fields X (n) . 



x (n) = arg max max 2 k P(X (i) = x (i) i ^ k|Y = y) 

x« x® i-n M 

L 

= arg max max max 2 k p(Y edvX 0) =x (l) i^k) 

x (D) x (i) i<n x<° i>n ^ J 



= arg max max 

X W x 0) i< n 

n 

2 k P(Y^dy,X ( ° -x (i) k^i^n|X (i) =x (i) i>n)P(X (i) 

k=0 
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L 

+ 2 k P(Y ^dy,X (0 (i) i>k) 

k=n+l 

n 

argmax max 2 k p(Yedy,X ® = x (i) k <"n|X (n+1) =x 

x (n) „ 0) : <n 

x k=0 



= argmax P(Y edy,X (n) =x (n) l X (n+1) =x (n+1) ) 

X (n) 

n-1 

+ max 2 k " n P (Y^dyX (0 =x (i) k ^n|X (n+,) =x (n+,; 

*® i<n ^ 

We next define a residue term, R(x (n) ), so that the following equality holds. 

x (n) =argmax P(Y edy,X (n) =x (n) I* (" +1 > =x (n+1) )(1 + R(x 
= arg max p x(D) , x(n+I) (x (n) I* (n+1) ,y)(l + R(x (n) )) 

x W * 7 

Specifically, R(x (n) ) is given by 

R(x (n > ) = max " « ^ P(Y edy ' X " = * " k ^ ^ |X ^ "* 
x (o i<n P(Y^dy,X (n) =x (n) l x < n+1 > =x < n+1 ) ) 

Since this expression is the ratio of positive quantities, we know that R ^ 0. Further, we may 
bound R from above as follows 

n-l 

R(x (n) )= max 2 k ~ n P(X (i) =x (i) k ^i ^n- 1|X (n) =x (n) 

x( ° i<n k=o 

n-l 



max 2 k ~ n P(X (nH) =x (n-1) l x (n) =x (n) Y = y) 

(a-l) ~ " > y) 
k=0 

^max P / X (n_1) = x (n_1) |X (") =x (") y = v) 

x <-l) 

= max p x(0 -o , X(D) y (x (nH) l* (n) ,y) 



Finally, we have that 



x (n) = arg max log p x « n) , x (n+1) >y (x (n) I* 0»» , y ) + i og (l + R( x W ) 

= aro may W n ™ i» ~n <v (n) l x (n+1) v^ + R(x (n) ^ 
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where 

0 5T£(x (n) ) = log(l + R(x (n) )) - 1 
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max p x (-> |x oo v (x (n_1) l x(n) ,y) 



x (.-d 



B Appendix 

In this appendix, we show by induction that for a quadtree based pyramid structure the distribution 
of Y given X (0) has the form of (7) and that the terms in the product may be computed using the 
recursion of (8). For n = 0, these relations are true by assumption. So, assuming the result for 
scale n yields 

P(Y*dy|X (n+,) =x( n+1 >)= P(Y<=dy|X (n) =x (n) )p x ,> „ M (x (n) l x(n+ 

x<°> 

p yt c» | X (.) (y r (n) l x J n) ) p x <»> , x 

M 

p <., |xW (y<"> l x [ n) )P x o | X <-) (x| n) I 

reS » xj 0 ' =1 

M 

Py « |x oo (y< n) l x r n) )p x ,> |x 

seS (D * n red " (s) x <°> =, 

I ( n+ 0 lx("+l) \ 

p w *»o | X (^.) (y^ l x ^ ) 
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where 

M 

P !x cn+i) (y s (n+1) l k ) = p (n) | X ( B ) (y* n) l m )P (n) . (o+i) (n 

red ~ (s) m=1 

C Appendix 

In this appendix, we prove theorem 1 by extending basic results on the convergence of the EM 
algorithm [44, 45]. 

By the stated assumptions and theorem 4.1(v) of [45], any limit point,0, of the sequence {0 
has the property that0 ^arg max QflW- Since Q is convex and Q is differentiable, thi 

that for all 0 efi 

D 1>0 Q(0,0)(0 - ~ 0) <T0 
where D 1>0 computes the gradient of Q with respect to the first argument. 
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Let 2 o bean open set containing G such that Q and L are continuous, differentiable on Q 
Then define the continuous, differentiable function 11(0,0) = Q(0,0) - L(0) on Q 
shown [33] that H has the property^ ^arg max eeQ, o H(0,0) which implies that 

DL(0) = D ! '° Q(0,0) + D uo H(0,0) 

-D 1,O Q(0,0). 

Therefore, for any 0 eQ, we have 

DL0)(0-- 0) = D ''° Q(0,0)(0 - - 0)^0. 
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Since L is a convex function and Q is a convex set, this implies thatfl is a global maximum of L. 
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Figure 6: a) Image depicting the classes in synthetic test images. Each 
of the 6 classes is indicated by a distinct gray level. Three synthetic 
test images, b) image 1, c) image 2, and d) image 3. Each region is 
distinguished by its mean and variance. 
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Figure 7: Segmentations of image 1. Each color denotes a different class, 
a) SMAP, b) MAP with 500 iterations of SA, c) MAP with 100 iterations 
of S A, and d) MAP with ICM. 
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Figure 8: Segmentations of image 2. Each color denotes a different class, 
a) SMAP, b) MAP with 500 iterations of SA, c) MAP with 100 iterations 
of SA, and d) MAP with ICM. 
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Figure 9: Segmentations of image 3. Each color denotes a different class, 
a) SMAP, b) MAP with 500 iterations of SA, c) MAP with 100 iterations 
of SA, and d) MAP with ICM. 
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Figure 10: A 430x400 subregion of a Multispectral remotely sensed SPOT 
image. Ground truth measurements were taken at 90 positions (transects) 
located throughout the full image. Each transect is approximately 5 pixels 
in size. 
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Figure 1 1 : Segmentations of multispectral remotely sensed SPOT image 
for subregions of size 430 * 400. a) SMAP segmentation, b) MAP with 
100 iterations of SA, c) MAP with ICM, and d) Maximum likelihood 
segmentation. For each image class 1 -* black; class 2 -* red; class.3 -* 
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green; class 4 -* blue; class 5 -* white. 
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